Single spin Toffoli-Fredkin logic gate
The Toffoli–Fredkin (TF) gate is a universal reversible logic gate capable of performing logic operations without dissipating energy. Here, we show that a linear array of three quantum dots, each hosting a single electron, can realize the TF gate if we encode logic bits in the spin polarization of the electrons and allow nearest-neighbor exchange coupling. The dynamics of the TF gate is realized by selectively driving spin resonances in the coupled spin system with an ac magnetic field. The conditions for gate operation are established, and estimates of the switching speed and gate error are provided.
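As a rough numerical illustration of the mechanism (not the paper's model or parameters; the helper op and the couplings J12, J23 are assumptions chosen for the sketch), the Python snippet below builds the 8x8 nearest-neighbor isotropic Heisenberg exchange Hamiltonian for three single-electron spins and lists its eigenfrequencies. The state-dependent splittings in such a spectrum are what let an ac magnetic field, tuned to one conditional resonance, rotate a target spin only for the desired configuration of its neighbors.

```python
import numpy as np

# Pauli matrices and the 2x2 identity
I2 = np.eye(2)
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

def op(single, site, n=3):
    # Embed a single-site operator at position `site` of an n-spin register
    mats = [I2] * n
    mats[site] = single
    out = mats[0]
    for m in mats[1:]:
        out = np.kron(out, m)
    return out

# Nearest-neighbor isotropic exchange between the three dots
# (J12, J23 are illustrative coupling strengths, not values from the paper)
J12, J23 = 1.0, 1.0
H_ex = sum(
    0.25 * J * (op(sx, i) @ op(sx, i + 1)
                + op(sy, i) @ op(sy, i + 1)
                + op(sz, i) @ op(sz, i + 1))
    for i, J in [(0, J12), (1, J23)]
)

# Each spin's resonance frequency now depends on the state of its neighbors;
# an ac drive tuned to one conditional resonance addresses the target spin
# only for the desired control configuration.
eigenfreqs = np.linalg.eigvalsh(H_ex)
print(np.round(eigenfreqs, 3))
```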
Conformalized Multimodal Uncertainty Regression and Reasoning
This paper introduces a lightweight uncertainty estimator capable of
predicting multimodal (disjoint) uncertainty bounds by integrating conformal
prediction with a deep-learning regressor. We specifically discuss its
application for visual odometry (VO), where environmental features such as
flying domain symmetries and sensor measurements under ambiguities and
occlusion can result in multimodal uncertainties. Our simulation results show
that uncertainty estimates in our framework adapt sample-wise against
challenging operating conditions such as pronounced noise, limited training
data, and limited parametric size of the prediction model. We also develop a
reasoning framework that leverages these robust uncertainty estimates and
incorporates optical flow-based reasoning to improve prediction accuracy.
Thus, by appropriately accounting for predictive uncertainties of
data-driven learning and closing their estimation loop via rule-based
reasoning, our methodology consistently surpasses conventional deep learning
approaches in all these challenging scenarios (pronounced noise, limited
training data, and limited model size), reducing the prediction error by 2-3x.
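As background for how conformal calibration attaches distribution-free bounds to a deep regressor, the sketch below shows plain split-conformal regression on scalar residuals. It covers only the basic single-interval case under the usual held-out calibration set and exchangeability assumptions; the function name split_conformal_interval and the toy data are illustrative, and the paper's multimodal (disjoint) bounds and optical-flow reasoning loop are not reproduced here.

```python
import numpy as np

def split_conformal_interval(residuals_cal, y_pred_test, alpha=0.1):
    # Split-conformal regression: calibrate a quantile of absolute residuals
    # on held-out data, then pad test predictions by that quantile to obtain
    # intervals with (1 - alpha) marginal coverage.
    n = len(residuals_cal)
    q_level = np.ceil((n + 1) * (1 - alpha)) / n  # finite-sample correction
    q_hat = np.quantile(np.abs(residuals_cal), min(q_level, 1.0))
    return y_pred_test - q_hat, y_pred_test + q_hat

# Toy usage: calibration residuals from any regressor (e.g., a VO pose model)
rng = np.random.default_rng(0)
residuals_cal = rng.normal(0.0, 0.5, size=500)
y_pred_test = np.array([1.2, -0.3, 4.7])
lo, hi = split_conformal_interval(residuals_cal, y_pred_test, alpha=0.1)
print(np.c_[lo, hi])
```

A multimodal extension would replace the absolute residual with a nonconformity score defined over several predicted modes, yielding a union of disjoint intervals rather than a single one.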
ADC/DAC-Free Analog Acceleration of Deep Neural Networks with Frequency Transformation
The edge processing of deep neural networks (DNNs) is becoming increasingly
important due to its ability to extract valuable information directly at the
data source to minimize latency and energy consumption. Frequency-domain model
compression, such as with the Walsh-Hadamard transform (WHT), has been
identified as an efficient alternative. However, the benefits of
frequency-domain processing are often offset by the increased
multiply-accumulate (MAC) operations required. This paper proposes a novel
approach to energy-efficient acceleration of frequency-domain neural
networks by utilizing analog-domain frequency-based tensor transformations. Our
approach offers unique opportunities to enhance computational efficiency,
resulting in several high-level advantages, including array micro-architecture
with parallelism, ADC/DAC-free analog computations, and increased output
sparsity. Our approach achieves more compact cells by eliminating the need for
trainable parameters in the transformation matrix. Moreover, our novel array
micro-architecture enables adaptive stitching of cells column-wise and
row-wise, thereby facilitating perfect parallelism in computations.
Additionally, our scheme enables ADC/DAC-free computations by training against
highly quantized matrix-vector products, leveraging the parameter-free nature
of matrix multiplications. Another crucial aspect of our design is its ability
to handle signed-bit processing for frequency-based transformations. This leads
to increased output sparsity and reduced digitization workload. On 16×16
crossbars, for 8-bit input processing, the proposed approach achieves an
energy efficiency of 1602 tera operations per second per watt (TOPS/W)
without the early termination strategy and 5311 TOPS/W with the early
termination strategy at VDD = 0.8 V.
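For intuition on why the transformation matrix carries no trainable parameters, the sketch below implements the fast Walsh-Hadamard transform in software (the helper fwht and the toy input are illustrative): every butterfly stage uses only signed additions of +1/-1-weighted terms, so no multiplications or learned weights are involved. The analog crossbar realization and the ADC/DAC-free quantized training described above are not modeled here.

```python
import numpy as np

def fwht(x):
    # In-place fast Walsh-Hadamard transform; length must be a power of two.
    # The transform matrix has entries +1/-1, so each stage needs only
    # additions and subtractions and carries no trainable parameters.
    x = np.asarray(x, dtype=np.int64).copy()
    n, h = len(x), 1
    while h < n:
        for i in range(0, n, h * 2):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    return x

# Toy 16-point input, e.g., one column of an activation tile
print(fwht(np.arange(16)))
```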
Memory-Immersed Collaborative Digitization for Area-Efficient Compute-in-Memory Deep Learning
This work discusses memory-immersed collaborative digitization among
compute-in-memory (CiM) arrays to minimize the area overheads of a conventional
analog-to-digital converter (ADC) for deep learning inference. Thereby, using
the proposed scheme, significantly more CiM arrays can be accommodated within
limited footprint designs to improve parallelism and minimize external memory
accesses. Under the digitization scheme, CiM arrays exploit their parasitic bit
lines to form a within-memory capacitive digital-to-analog converter (DAC) that
facilitates area-efficient successive approximation (SA) digitization. CiM
arrays collaborate such that a proximal array digitizes the analog-domain
product-sums while another array computes the scalar product of inputs and
weights. We
discuss various networking configurations among CiM arrays where Flash, SA, and
their hybrid digitization steps can be efficiently implemented using the
proposed memory-immersed scheme. The results are demonstrated using a 65 nm
CMOS test chip. Compared to a 40 nm-node 5-bit SAR ADC, our 65 nm design
requires 25× less area and 1.4× less energy by
leveraging in-memory computing structures. Compared to a 40 nm-node 5-bit Flash
ADC, our design requires 51× less area and 13× less
energy.
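As a behavioral reference for the digitization step, the sketch below models an ideal successive-approximation search that resolves an analog product-sum into a 5-bit code by binary search against a binary-weighted reference (the function sar_digitize and the parameters v_ref and bits are illustrative). In the proposed scheme the reference levels come from the parasitic bit lines of a neighboring CiM array acting as a capacitive DAC; that circuit-level behavior is not captured in this idealized model.

```python
def sar_digitize(v_in, v_ref=1.0, bits=5):
    # Successive-approximation digitization: test bits from MSB to LSB,
    # keeping a bit whenever the input still exceeds the trial DAC level.
    code = 0
    for b in reversed(range(bits)):
        trial = code | (1 << b)
        v_dac = v_ref * trial / (1 << bits)  # ideal binary-weighted DAC level
        if v_in >= v_dac:
            code = trial  # keep this bit
    return code

# Toy usage: digitize an analog-domain product-sum of 0.63 * v_ref
print(sar_digitize(0.63))  # -> 20, i.e., 20/32 = 0.625 of full scale
```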